A Non-Uniform Cache Architecture on Networks-on-Chip: A Fully Associative Approach with Pre-Promotion
Abstract
Global interconnect is becoming the delay bottleneck in microprocessor designs, and the latency of large on-chip caches will be intolerable in deep-submicron technologies. The recently proposed Non-Uniform Cache Architectures (NUCAs) exploit the variation in access time across subarrays to reduce typical latency. The dynamic NUCA (D-NUCA) design uses a set-associative structure, which limits the flexibility of data placement and replacement. This paper investigates one unexplored point in the design space: a fully associative approach. In addition, we propose a pre-promotion technique to reduce the number of incremental searches in the distributed cache banks. We show that, compared with a traditional multi-level cache, an improvement in IPC of up to 110% is achieved at 30 nm.
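As a rough illustration of why incremental search cost depends on where a line sits, the following toy model counts how many banks are probed per access in a fully associative cache whose banks are searched nearest-first. It is a minimal sketch and not the paper's mechanism: the class `ToyNuca`, the promote-one-bank-on-hit rule, and the fill-into-farthest-bank rule are all assumptions made for illustration, and the pre-promotion technique itself is not modeled because the abstract does not describe how it works.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNuca:
    """Hypothetical toy model: a fully associative cache split into banks.

    Bank 0 is assumed to be the closest (fastest) bank; any tag may live
    in any bank. Each bank is a list kept in most-recently-used-first order.
    """
    num_banks: int
    bank_capacity: int
    banks: list = field(init=False)

    def __post_init__(self):
        self.banks = [[] for _ in range(self.num_banks)]

    def access(self, tag):
        """Return the number of banks probed; num_banks + 1 signals a miss."""
        for i, bank in enumerate(self.banks):  # incremental, nearest-first search
            if tag in bank:
                bank.remove(tag)
                # Assumed policy: a hit moves the line one bank closer.
                self._insert(max(i - 1, 0), tag)
                return i + 1
        # Assumed policy: a missing line is filled into the farthest bank.
        self._insert(self.num_banks - 1, tag)
        return self.num_banks + 1

    def _insert(self, i, tag):
        """Put tag at the MRU position of bank i, demoting any overflow outward."""
        self.banks[i].insert(0, tag)
        if len(self.banks[i]) > self.bank_capacity:
            victim = self.banks[i].pop()  # least recently used line in bank i
            if i + 1 < self.num_banks:
                self._insert(i + 1, victim)
            # Otherwise the victim leaves the cache entirely.


cache = ToyNuca(num_banks=3, bank_capacity=1)
probes = [cache.access("A") for _ in range(4)]
print(probes)  # [4, 3, 2, 1]
```

In this model a line needs several hits before it reaches the nearest bank, so its first few accesses each pay for a multi-bank search. That per-access search count is the quantity the abstract says pre-promotion is meant to reduce.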
Similar resources
Cost-aware Topology Customization of Mesh-based Networks-on-Chip
Nowadays, the growing demand for supporting multiple applications leads to the integration of multiple IPs on a single chip. In fact, finding a truly scalable communication architecture will be a critical concern. To this end, the Networks-on-Chip (NoC) paradigm has emerged as a promising solution to on-chip communication challenges in silicon-based electronics. Many of today’s NoC architectures are based...
An Efficient Data Access Policy in shared Last Level Cache
Future multi-core systems will execute massive memory-intensive applications with significant data sharing. On-chip memory latency increases further as more cores are added, since the diameter of most on-chip networks grows with the number of cores. This makes it difficult to implement caches with a single uniform access latency, leading to non-uniform cache architectures (NUCA). Data mov...
A Review of STT-RAM, SRAM, and eDRAM and Methods of Optimization for Computer Architecture
The following Capstone Report seeks to outline the differences in cache designs and, more thoroughly, the computer architecture of STT-RAM, SRAM, and eDRAM. It begins by outlining the use of cache, followed by the different protocols for implementation, including direct mapping, fully associative, and set associative configurations. An area of interest in this study is implementing STT-RAM over SRAM b...
Cooling the Hot Sets: Improved Space Utilization in Large Caches via Dynamic Set Balancing
Multi-megabyte on-chip last-level caches are commonplace in high-end computing platforms. Even though these caches are often designed to have very high associativity, they suffer from non-uniform utilization of the sets leading to a high volume of conflict misses. Clustering of physical addresses to a few hot sets happens partly due to poor locality in the access stream and partly due to a mism...
Low-Power L2 Cache Architecture for Multiprocessor System on Chip Design
A significant portion of cache energy in a highly associative cache is consumed during tag comparison. In this paper, tag comparison is carried out by predicting both cache hits and cache misses using a multistep tag comparison method. A partially tagged Bloom filter is used for cache miss prediction by checking the non-membership of addresses, and a hotline check is used for cache hit prediction by reducing...
Publication date: 2004